first attempt at mcstats implementation by ammitra · Pull Request #12 · JHU-Tools/2DAlphabet

ammitra · 2026-06-03T03:27:35Z

A first attempt at implementing automatic MC statistical uncertainties, à la Combine.

I followed the Combine implementation outlined here.

Combine handles automatic MC stats by using text2workspace.py, which parses the datacard and finds all TH1 objects and creates a RooRealVar object for all param objects in the card. These objects get attached a Gaussian constraint and are treated as a nuisance parameter. Notably, this only works for RooWorkspace objects that do not contain RooDataHists. This disqualifies 2DAlphabet workspaces from using automcstats, since we use 2D RooDataHists.

I think that enabling something like what Combine does using params in the datacard would necessitate a very difficult re-write of the Combine code, so this implementation uses a more naive approach. During workspace creation, when building the OrganizedHist object, 2DAlphabet will now loop over all nominal background histograms in each region (signal can be included as well, though I don't see why one would want to). The algorithm follows Combine's implementation:

Sum the yields $n_i$ and uncertainties $e_i$ of each background process $i$ in bin (bx, by) of the 2D histogram.
Calculate $n_\text{tot} = \sum_{i\in \text{bkg}} n_i$, and $e_\text{tot} = \sqrt{\sum_{i\in \text{bkg}} e_i^2}$
If $e_\text{tot}= 0 $, the bin is skipped and no Up/Down $\pm 1\sigma$ TH2s are created
Calculate the effective number of unweighted events $n_\text{tot}^\text{eff} = n_\text{tot}^2 / e_\text{tot}^2$
If $n_\text{tot}^\text{eff} \leq n^\text{threshold}$, separate uncertainties are created per-process
- NOTE: Combine creates a Poisson-constrained parameter. We can't really do this with 2DA, so instead we just create $\pm 1\sigma$ templates (where $-1\sigma$ is bounded below at 0), and then Combine automatically assigns these as nuisance parameters with Gaussian constraints.
If $n_\text{tot}^\text{eff} > n^\text{threshold}$, then we create a single Gaussian-constrained Barlow-Beeston-lite template that will scale the total yield in the bin.
- NOTE: I do this by ensuring that every process in this bin shares the same nuisance parameter (histogram) name, so that every background process is controlled by the same parameter in Combine. See below for some reasoning.

The resulting histograms are identical to the nominal background histograms except for the given bin, which is shifted up or down by some scaling factor governed by the number of events.

Case 1: $n_\text{tot}^\text{eff} > n^\text{threshold}$

In this case, we are creating a single nuisance parameter that scales the total yield in the bin. To accomplish this, I use the following logic:

Each process $i$ in this bin gets its nominal yield $n_i$ scaled by $1\pm \text{r}$, where the "down" variation is bounded at 0. For this case, $r = e_\text{tot} / n_\text{tot}$. Thus, in this bin, process $i$ contributes $n_i * (1 + \nu * r)$, where $\nu$ is the value of the (Gaussian-constrained) nuisance parameter. For all processes in this bin, we then obtain:

$$ \begin{align} n_i^\pm &= \sum_{i\in \text{bkg}} n_i * (1 \pm \nu * r) \\ &= (1 \pm \nu * r) * \sum_{i\in \text{bkg}} n_i \\ &= (1 \pm \nu * e_\text{tot} / n_\text{tot}) * n_\text{tot} \\ n_i^\pm &= n_\text{tot} \pm \nu * e_\text{tot} \end{align} $$

Since $e_\text{tot} = \sqrt{\sum_i e_i^2}$, we get the combined MC stat uncertainty in this given bin. And, since $\nu$ is shared for all processes in this bin, a single nuisance parameter controls the yields of all processes in this bin.

Usage

The user does not have to do anything special, I've enabled 2DA to use "autoMCstats" by default. The method can be turned off, its threshold modified, or signals included in the calculation, by modifying the JSON:

    "OPTIONS": {
        "mcstats": true,
        "mcstats_threshold": 10, 
        "mcstats_include_signal": false
    }

Other than that, no change needs to be made to the user's 2DAlphabet python script or JSON config file. Note that this will produce at least N * R * 2 new histograms, where N is the number of processes, R is the number of regions (pass, fail, etc), and 2 is for up/down. There may be more histograms if the BB-lite threshold is not met in a given bin.

Testing

I've tested this with some simple dummy histograms simulating $X \to H(b\bar{b}) Y(b\bar{b})$ in a Fail and Pass region and 2D histograms of data, a single signal, and Zbb and Hbb backgrounds. The testing workspace can be recreated with the following files:

test.tar.gz

The tarball contains:

an input ROOT file containing the data, Zbb+Hbb backgrounds, and one signal. The backgrounds and signal are subject to two shape uncertainties.
a JSON file for 2DAlphabet defining the pass and fail regions, all processes, and the two shape systematics. The JSON also contains an OPTIONS section that customizes the mcstats options
a python script to create the 2DAlphabet workspace with a few different transfer function parameterizations

Testing results

By default, I have 2DAlphabet give verbose output when creating the mcstat nuisances. The output is more or less copied entirely from combine. Here's the example output when creating the workspace:

(twoD-env) [ammitra@cmslpc331 dummy_test]$ python create_workspace.py 
Doing GLOBAL variable replacement in input json...
Running with options: Namespace(verbosity=0, overwrite=False, debugDraw=False, blindedPlots=[], blindedFit=[], mcstats=True, mcstats_threshold=10, mcstats_include_signal=False, haddSignals=True, plotTitles=False, plotTemplateComparisons=False, plotPrefitSigInFitB=False, plotEvtsPerUnit=False, year=1)

================================================================
Analyzing bin errors for region: fail
Poisson cut-off: 10
Processes summed: Zbb Hbb
Processes excluded from sums: XHY-1800-100
================================================================
Bin                 Contents           Error  Notes
----------------------------------------------------------------
fail (1, 1)         6.000000        2.449490  total sum
fail (1, 1)         6.000000        2.449490  excluding marked processes
fail (1, 1)         6.000000        2.449490  Effective events, alpha=1.000000
  => N_eff <= 10 : per-process (Poisson approximated as gaussian)
  ----------------------------------------------------------
    Zbb                      6.000000       2.449490
                             6.000000       2.449490  Effective events, alpha=1.000000
      => 'mcstat_fail_Zbb_bx1_by1' [1.00,0.00,3.86] to be gaussian constrained (width=0.408248)
  ----------------------------------------------------------
    Hbb                      0.000000       0.000000
      => Content or error is zero, ignore
----------------------------------------------------------------
fail (1, 2)        51.000000        7.141428  total sum
fail (1, 2)        51.000000        7.141428  excluding marked processes
fail (1, 2)        51.000000        7.141428  Effective events, alpha=1.000000
  => N_eff > 10 : shared gaussian shape 'mcstat_fail_bx1_by2' (rel=0.140028) shared by [Zbb, Hbb]
----------------------------------------------------------------
...

One can already see two examples of the MC stat functionality working for cases where (1) the $n_\text{tot}^\text{eff}$ is above threshold and (2) it is below the threshold. In the former case a shared parameter is created and in the latter separate parameters are created per-process.

One can also see that signal is excluded from this calculation and from MC statistical uncertainties, as per the JSON options.

We can also look at the post-fit results for the fits with and without MC statistical uncertainties, using a constant transfer function for the QCD estimate. All other fit options are the same between the two cases. In the former, the pulls are noticeably smaller, indicating that the fit is making use of the additional freedom provided by the MC statistical uncertainty parameters.

mX projection:

mY projection:

mroguljic · 2026-06-03T11:53:01Z

Looks good, thanks for implementing it (so quickly as well!). My comments are just questions for clarification.

Could you open a PR to the gitlab repo so we get the unit tests done?
Your test example looks nice. We should add it to the unit tests.
See the comment about alpha threshold. This could significantly reduce the number of created variation histogram

mroguljic · 2026-06-03T11:47:07Z

+                        if verbose:
+                            print('  => N_eff > %d : shared gaussian shape %r (rel=%.6f) shared by [%s]'
+                                % (threshold, variation, rel, ', '.join(nominal.keys())))
+                        for p in nominal:


If we set a condition for calling _make_mcstat_template to something like alpha < alpha_min, it would prevent the creation of, and adding to the ledger, variation histograms for nuisances that have negligible impact. Correct?

That is correct, but I thought about this while writing the implementation and I'm not sure we can use alpha itself as a criteria for dropping bins from the MC statistical uncertainty treatment.

alpha is defined as

$$ \begin{align} \alpha &= n_\text{tot} / N_\text{tot}^\text{eff} \\ & = n_\text{tot} / (n_\text{tot}^2 / e_\text{tot}^2) \\ \alpha &= e_\text{tot}^2 / n_\text{tot} \end{align} $$

So we have:

alpha = e**2/n_tot = w (the per-event weight) N_eff = n_tot/alpha (effective number of MC events)

I can't immediately see a sensible value for alpha_min that would allow us to gauge the impact of a process. What do you think ?

I think a better treatment than using alpha on its own would be to compare the MC background processes to the data-driven background process. Then one would ignore bins in which the MC backgrounds contribute to the total background by a much smaller fraction than the data-driven background. This would then govern the "impact" of the MC process. What do you think?

At alpha= 1/x, we have x times more MC events than their yield. The MC stat. uncert. is sqrt(x) times lower than the expected Poisson uncertainty of observed yield from these processes. At sufficiently high (low) x (alpha), MC stat. uncert. becomes negligible.

Moreover, most of our backgrounds typically comes from the data-driven estimate. The expected Poisson uncertainty of the observed data then includes a contribution from that as well. The threshold then depends on the analysis.

…ram creation

first attempt at mcstats implementation

101e996

mroguljic reviewed Jun 3, 2026

View reviewed changes

ammitra added 2 commits June 3, 2026 10:09

Allow processes to be excluded from MC statistical uncertainty histog…

fe70c70

…ram creation

warnings when excluding all processes from mcstat clac

16a7164

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

first attempt at mcstats implementation#12

first attempt at mcstats implementation#12
ammitra wants to merge 3 commits into
masterfrom
mcstats

ammitra commented Jun 3, 2026

Uh oh!

mroguljic commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

mroguljic Jun 3, 2026

Uh oh!

ammitra Jun 3, 2026

Uh oh!

mroguljic Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ammitra commented Jun 3, 2026

Usage

Testing

Testing results

Uh oh!

mroguljic commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

mroguljic Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

ammitra Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

mroguljic Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants